RSR 🧮: Efficient Matrix Multiplication for Accelerating Inference in Binary and Ternary Neural Networks
This project aims to provide a fast and efficient approach to low-bit matrix multiplication.
This repository implements Redundant Segment Reduction (RSR), a fast matrix multiplication algorithm designed for the binary and ternary weight matrices used in low-bit neural networks.
RSR reduces the cost of these multiplications by a factor of log(n), which makes it particularly useful for low-bit deep learning and efficient inference.
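To make the log(n) claim concrete, here is a minimal pure-Python sketch of the segment-reduction idea as we understand it. It is not the repository's implementation, and all function names are hypothetical. It assumes the product is v^T·W for a fixed n × m weight matrix W. The columns are split into blocks of width k = ⌊log₂ n⌋, and, offline, the rows in each block are bucketed by the k-bit pattern they contain. At inference time each block costs one addition per row, plus O(2^k) = O(n) work to turn the per-pattern sums into column results. The whole product therefore takes O(n·m / log n) operations instead of O(n·m). Handling ternary weights by splitting W = P − N into two binary matrices is also our assumption.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[int]]
# (first column of the block, block width, row indices bucketed by k-bit pattern)
Block = Tuple[int, int, List[List[int]]]


def preprocess(B: Matrix) -> List[Block]:
    """Offline step, run once per fixed binary weight matrix B (n x m).

    Assumption: block width k = floor(log2 n), so there are at most n patterns.
    """
    n, m = len(B), len(B[0])
    k = max(1, int(math.log2(n)))
    blocks: List[Block] = []
    for start in range(0, m, k):
        width = min(k, m - start)
        groups: List[List[int]] = [[] for _ in range(1 << width)]
        for r in range(n):
            code = 0
            for j in range(width):  # first column of the block = most significant bit
                code = (code << 1) | B[r][start + j]
            groups[code].append(r)
        blocks.append((start, width, groups))
    return blocks


def fold_patterns(u: List[int], width: int) -> List[int]:
    """Multiply u (one sum per pattern) by the 2^width x width matrix of all
    bit patterns in O(2^width), one column per halving step."""
    result = []
    cur = u
    for _ in range(width):
        half = len(cur) // 2
        result.append(sum(cur[half:]))  # patterns whose leading bit is 1
        cur = [cur[i] + cur[i + half] for i in range(half)]  # drop the leading bit
    return result


def rsr_vec_mat(v: List[int], blocks: List[Block], m: int) -> List[int]:
    """Online step: computes v^T B from the preprocessed blocks."""
    out = [0] * m
    for start, width, groups in blocks:
        # Segment sums: each row index appears in exactly one bucket, so O(n) per block.
        u = [sum(v[r] for r in rows) for rows in groups]
        out[start:start + width] = fold_patterns(u, width)
    return out


def preprocess_ternary(W: Matrix) -> Tuple[List[Block], List[Block]]:
    """Assumed ternary handling: W in {-1, 0, 1} is split as W = P - N."""
    P = [[1 if x == 1 else 0 for x in row] for row in W]
    N = [[1 if x == -1 else 0 for x in row] for row in W]
    return preprocess(P), preprocess(N)


def ternary_vec_mat(v: List[int], pre: Tuple[List[Block], List[Block]], m: int) -> List[int]:
    pos = rsr_vec_mat(v, pre[0], m)
    neg = rsr_vec_mat(v, pre[1], m)
    return [a - b for a, b in zip(pos, neg)]


def naive_vec_mat(v: List[int], W: Matrix) -> List[int]:
    """Reference O(n * m) product, used only to check the sketch."""
    return [sum(v[r] * W[r][c] for r in range(len(W))) for c in range(len(W[0]))]


if __name__ == "__main__":
    random.seed(0)
    n, m = 16, 10
    W = [[random.choice([-1, 0, 1]) for _ in range(m)] for _ in range(n)]
    v = [random.randint(-5, 5) for _ in range(n)]
    assert ternary_vec_mat(v, preprocess_ternary(W), m) == naive_vec_mat(v, W)
    print("RSR sketch matches the naive product")
```

Preprocessing is kept separate from the per-input step because the weights are fixed at inference time, so its cost is paid once. A performance-oriented version would likely store each block as flat arrays, for example a row permutation plus segment boundaries, rather than nested Python lists.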
The codebase provides ready-to-use C++ and NumPy-based implementations, as well as PyTorch implementations with both CPU and GPU support, enabling scalable, optimized matrix operations in deep learning environments.
It also includes sample experiments on various `1.58bit` models and LLMs. ✨